Modify int_customer_segments: add loyalty_score and churn risk columns#7
Conversation
📝 Walkthrough
The pull request modifies an intermediate dbt SQL model to add two new computed columns.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dbt/models/intermediate/int_customer_segments.sql`:
- Line 32: int_customer_segments currently never produces the 'Inactive' branch
because lifetime_orders comes from int_customer_order_history (always >=1);
update dim_customers to handle customers with no orders by coalescing segment
metrics from int_customer_segments: wrap days_since_last_order, loyalty_score,
lifetime_orders, and segment (the column produced by int_customer_segments) with
COALESCE and sensible defaults (e.g., COALESCE(lifetime_orders,0),
COALESCE(segment,'Inactive'), COALESCE(days_since_last_order,<default>),
COALESCE(loyalty_score,0)) so downstream churn logic gets deterministic numeric
values instead of NULLs, or alternatively add not_null tests on those columns in
dim_customers if you prefer to enforce only customers-with-orders; modify the
SELECT that joins int_customer_segments in dim_customers accordingly.
- Line 32: Replace the adapter-specific datediff call with dbt's cross-database
macro: change the expression producing days_since_last_order to use {{
dbt.datediff(first_date, second_date, datepart) }} with first_date =
last_order_date, second_date = current_date, and datepart = 'day' (i.e., use {{
dbt.datediff(last_order_date, current_date, 'day') }} as days_since_last_order)
in the model that defines days_since_last_order; also consider whether rows with
last_order_date IS NULL (inactive customers) should be excluded or explicitly
handled after this change.
```sql
-- NEW: Loyalty score based on orders and spend
(lifetime_orders * 10) + (lifetime_spend * 0.5) as loyalty_score,
-- NEW: Days since last order for churn risk
datediff('day', last_order_date, current_date) as days_since_last_order
```
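As a sanity check on the arithmetic, the two expressions can be mirrored in plain Python (the sample values are hypothetical, not taken from the repository):

```python
from datetime import date

# Hypothetical customer row mirroring int_customer_order_history output
customer = {
    "lifetime_orders": 5,
    "lifetime_spend": 200.0,
    "last_order_date": date(2024, 1, 1),
}
as_of = date(2024, 6, 11)  # stand-in for current_date

# Mirrors: (lifetime_orders * 10) + (lifetime_spend * 0.5) as loyalty_score
loyalty_score = customer["lifetime_orders"] * 10 + customer["lifetime_spend"] * 0.5

# Mirrors: datediff('day', last_order_date, current_date)
days_since_last_order = (as_of - customer["last_order_date"]).days

print(loyalty_score, days_since_last_order)  # 150.0 162
```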
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Find and read the upstream model int_customer_order_history.sql
fd "int_customer_order_history.sql" --exec cat -n {}
```

Repository: zingleai/dataPortal_demo
Length of output: 2702

🏁 Script executed:

```shell
#!/bin/bash
# Read the current file to see the full context, especially line 17's case statement
fd "int_customer_segments.sql" --exec cat -n {}
```

Repository: zingleai/dataPortal_demo
Length of output: 3705

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's a customers dimension table that might be left joined
rg -l "customer" --type sql | head -20
```

Repository: zingleai/dataPortal_demo
Length of output: 1787

🏁 Script executed:

```shell
#!/bin/bash
# Check for any LEFT JOIN patterns in the customer segments or order history models
rg "left join|left outer join" --type sql -i -A 2 -B 2
```

Repository: zingleai/dataPortal_demo
Length of output: 36079

🏁 Script executed:

```shell
#!/bin/bash
# Check dim_customers to see how segments are joined
fd "dim_customers.sql" --exec cat -n {}
```

Repository: zingleai/dataPortal_demo
Length of output: 5266

🏁 Script executed:

```shell
#!/bin/bash
# Check for data quality tests on customer segments and dimensions
fd "schema.yml" -t f --exec grep -l "int_customer_segments\|dim_customers" {} | xargs cat
```

Repository: zingleai/dataPortal_demo
Length of output: 50380
The 'Inactive' segment is unreachable code; days_since_last_order becomes NULL only in downstream models.
Within int_customer_segments, all rows originate from int_customer_order_history, which aggregates from orders using GROUP BY customer_id. This means lifetime_orders is always ≥ 1, making the else 'Inactive' branch at line 17 impossible to reach.
However, dim_customers LEFT JOINs this model to the base customer dimension. Customers without orders will have NULL for all segment columns, including days_since_last_order and loyalty_score. No not_null tests protect these columns in dim_customers, and downstream churn logic consuming numeric fields will operate on NULLs.
Consider either:
- Adding `not_null` tests on segment columns in `dim_customers` to document that only customers with orders are included
- Using `coalesce()` in `dim_customers` when selecting segment metrics to provide defaults for customers without orders
- Filtering `dim_customers` to only include customers with `lifetime_orders > 0`
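A minimal sketch of the `coalesce()` option in `dim_customers` (the base customers relation `stg_customers` and the join key are assumptions; the column names come from the review):

```sql
select
    c.customer_id,
    coalesce(s.lifetime_orders, 0)  as lifetime_orders,
    coalesce(s.loyalty_score, 0)    as loyalty_score,
    coalesce(s.segment, 'Inactive') as segment,
    -- Pick an explicit default here, or keep NULL and document it
    s.days_since_last_order
from {{ ref('stg_customers') }} c
left join {{ ref('int_customer_segments') }} s
    on c.customer_id = s.customer_id
```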
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/intermediate/int_customer_segments.sql` at line 32,
int_customer_segments currently never produces the 'Inactive' branch because
lifetime_orders comes from int_customer_order_history (always >=1); update
dim_customers to handle customers with no orders by coalescing segment metrics
from int_customer_segments: wrap days_since_last_order, loyalty_score,
lifetime_orders, and segment (the column produced by int_customer_segments) with
COALESCE and sensible defaults (e.g., COALESCE(lifetime_orders,0),
COALESCE(segment,'Inactive'), COALESCE(days_since_last_order,<default>),
COALESCE(loyalty_score,0)) so downstream churn logic gets deterministic numeric
values instead of NULLs, or alternatively add not_null tests on those columns in
dim_customers if you prefer to enforce only customers-with-orders; modify the
SELECT that joins int_customer_segments in dim_customers accordingly.
🧩 Analysis chain
🌐 Web query:
dbt.datediff macro signature documentation minimum version
💡 Result:
dbt.datediff macro signature (as implemented in dbt's built-in cross-db macros):

`{{ dbt.datediff(first_date, second_date, datepart) }}`

The default implementation renders SQL like:

`datediff(datepart, first_date, second_date)` [1]
Minimum dbt version
dbt.datediff is part of the “cross-db macros” that were migrated from dbt_utils into dbt Core (so you call them as dbt.<macro>). That migration is called out in the dbt-utils v0.9.0 release notes. [2]
In practice, packages that bumped to the dbt-utils v1-era / “use dbt.<macro>” pattern required dbt-core >= 1.3.0. [3]
So: minimum dbt-core version to rely on dbt.datediff is 1.3.0. [2][3]
Sources: [1] [2] [3]
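For illustration, with the arguments suggested in this review, the call and its default rendering would look roughly like this (rendering shape per the signature above; individual adapters may dispatch to different SQL, so treat this as a sketch):

```sql
-- In the model:
{{ dbt.datediff("last_order_date", "current_date", "day") }} as days_since_last_order

-- Default rendering, following datediff(datepart, first_date, second_date):
datediff(day, last_order_date, current_date) as days_since_last_order
```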
🌐 Web query:
dbt datediff BigQuery Postgres cross-database macro support
💡 Result:
dbt does support a cross-database datediff macro for BigQuery and Postgres—it lives in dbt Core (the dbt. namespace), not dbt_utils, in modern projects.
- Use `{{ dbt.datediff(first_date, second_date, datepart) }}` (and similarly `{{ dbt.dateadd(...) }}`), and dbt will dispatch to the right SQL per adapter (e.g., BigQuery vs Postgres), handling dialect differences like argument order and function names. [1], [2]
- If you upgraded to dbt-utils v1.0+, note that cross-db macros (including `datediff`) were removed from `dbt_utils` and moved to dbt Core, so you must switch from `dbt_utils.datediff` to `dbt.datediff`. [3], [4]
- This migration of cross-db macros into dbt-core started in dbt-core v1.2.0 (July 26, 2022). [4]
Sources
[1] dbt Community Forum (example + motivation for cross-db datediff) (discourse.getdbt.com)
[2] dbt Community Forum (dispatched macros; dbt_utils.datediff history + adapter dispatch concept) (discourse.getdbt.com)
[3] dbt-utils README (states cross-db macros removed from dbt-utils v1.0 and points to dbt Core docs) (github.com)
[4] dbt-core v1.2.0 release notes (move cross-db macros from dbt-utils into dbt-core global project) (newreleases.io)
Citations:
- 1: https://discourse.getdbt.com/t/building-dbt-models-to-be-compatible-with-multiple-data-warehouses/402?utm_source=openai
- 2: https://discourse.getdbt.com/t/prerelease-v0-18-0-marian-anderson/1545?utm_source=openai
- 3: https://github.com/dbt-labs/dbt-utils?utm_source=openai
- 4: https://newreleases.io/project/github/dbt-labs/dbt-core/release/v1.2.0?utm_source=openai
🏁 Script executed:

```shell
# Search for dbt.datediff usage in the repository to understand context
cd /dev/null && echo "Checking for dbt core version hints in repo"
find . -name "dbt_project.yml" -o -name "packages.yml" -o -name "requirements.txt" -o -name "pyproject.toml" | head -20
```

Repository: zingleai/dataPortal_demo
Length of output: 219

🏁 Script executed:

```shell
# Check dbt version requirement in the repository
cat dbt/dbt_project.yml | head -30
```

Repository: zingleai/dataPortal_demo
Length of output: 612

🏁 Script executed:

```shell
# Find and check int_customer_order_history.sql for last_order_date column
find . -name "int_customer_order_history.sql" -type f
```

Repository: zingleai/dataPortal_demo
Length of output: 276

🏁 Script executed:

```shell
# Check the int_customer_order_history.sql file for last_order_date column
cat -n dbt/models/intermediate/int_customer_order_history.sql
```

Repository: zingleai/dataPortal_demo
Length of output: 944

🏁 Script executed:

```shell
# Check the int_customer_segments.sql file to see current state
cat -n dbt/models/intermediate/int_customer_segments.sql | head -50
```

Repository: zingleai/dataPortal_demo
Length of output: 1522
Use {{ dbt.datediff() }} cross-database macro instead of raw datediff() syntax.
The syntax datediff('day', last_order_date, current_date) is specific to Snowflake, Redshift, and Databricks. BigQuery uses DATE_DIFF() with different argument order, and Postgres has no native datediff() function. This will fail or produce incorrect results on any adapter that doesn't support this exact signature.
dbt provides a cross-database macro {{ dbt.datediff(first_date, second_date, datepart) }} (available in dbt-core >= 1.2.0) that handles adapter-specific SQL generation. Note that your repository only requires dbt >= 1.0.0, so the project's required dbt version may need to be raised to >= 1.2.0 before relying on this macro.
🔧 Proposed fix
```diff
- datediff('day', last_order_date, current_date) as days_since_last_order
+ {{ dbt.datediff("last_order_date", "current_date", "day") }} as days_since_last_order
```

Note: For Inactive customers (where lifetime_orders = 0), the last_order_date will be NULL, resulting in NULL for days_since_last_order. Consider whether this behavior is acceptable for churn risk analysis, or if these rows should be excluded or handled explicitly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
{{ dbt.datediff("last_order_date", "current_date", "day") }} as days_since_last_order
```
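If NULLs should not flow into downstream churn logic, one option is to wrap the macro in `coalesce()` with an explicit sentinel (the 99999 default is an assumption to be agreed with downstream consumers, not part of the suggestion above):

```sql
-- Never-ordered customers get a large "days since" value, marking them high churn risk
coalesce(
    {{ dbt.datediff("last_order_date", "current_date", "day") }},
    99999
) as days_since_last_order
```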
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dbt/models/intermediate/int_customer_segments.sql` at line 32, Replace the
adapter-specific datediff call with dbt's cross-database macro: change the
expression producing days_since_last_order to use {{ dbt.datediff(first_date,
second_date, datepart) }} with first_date = last_order_date, second_date =
current_date, and datepart = 'day' (i.e., use {{ dbt.datediff(last_order_date,
current_date, 'day') }} as days_since_last_order) in the model that defines
days_since_last_order; also consider whether rows with last_order_date IS NULL
(inactive customers) should be excluded or explicitly handled after this change.
Summary
Modified the int_customer_segments intermediate model to add two new columns: loyalty_score (weighted score based on orders and spend) and days_since_last_order (for churn risk analysis).
Tags: customer-analytics, churn
Criticality: P2
Models (1): int_customer_segments
Lineage
Data Quality Tests (1)
Generated by Data Portal